# Lightweight Inference

**DeepSeek-R1-Distill-Qwen-14B-GGUF** · featherless-ai-quants · Large Language Model · 237 downloads · 1 like
DeepSeek-R1-Distill-Qwen-14B is a 14B-parameter model released by DeepSeek AI, distilled from DeepSeek-R1 onto the Qwen architecture; this repository provides multiple GGUF quantizations for faster, lower-memory inference (a download sketch follows this entry).

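Repositories like the one above usually publish each quantization level as a separate .gguf file. The sketch below shows one way to fetch a single file with the huggingface_hub client; the repository ID and filename are illustrative assumptions, so check the repository's actual file listing before use.

```python
# Hypothetical example: download one GGUF quantization from a Hugging Face repo.
# The repo_id and filename below are placeholders, not verified names.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="featherless-ai-quants/deepseek-ai-DeepSeek-R1-Distill-Qwen-14B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",  # assumed quantization file
)
print("GGUF file saved to:", gguf_path)
```
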
**Magma-8B-GGUF** · Mungert · MIT · Image-to-Text · 545 downloads · 1 like
Magma-8B is an image-text-to-text model distributed in GGUF format, suited to multimodal tasks.

**Qwen3-1.7B-GGUF** · prithivMLmods · Apache-2.0 · Large Language Model · English · 357 downloads · 1 like
Qwen3 is the latest generation of the Tongyi Qianwen (Qwen) series of large language models, offering both dense and mixture-of-experts (MoE) variants. Built on large-scale training, Qwen3 delivers major gains in reasoning, instruction following, agent capabilities, and multilingual support.

**GLM-4-9B-0414-GGUF** · unsloth · MIT · Large Language Model · Multilingual · 4,291 downloads · 9 likes
GLM-4-9B-0414 is a lightweight 9B-parameter member of the GLM family that performs well on mathematical reasoning and general tasks, making it an efficient option for resource-constrained deployments.

**Qwen3-8B-Q4_K_M-GGUF** · ufoym · Apache-2.0 · Large Language Model · Transformers · 342 downloads · 3 likes
A GGUF-format build of Qwen3-8B that runs under the llama.cpp framework and supports text generation; a minimal loading sketch follows this entry.

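Several entries in this list, including the one above, target the llama.cpp runtime. The sketch below shows one common way to load such a GGUF file through the llama-cpp-python bindings; the local filename, context size, and prompt are assumptions for illustration rather than values taken from the repository.

```python
# Minimal sketch: run a local GGUF model with llama-cpp-python
# (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-8b-q4_k_m.gguf",  # assumed local filename of the downloaded GGUF
    n_ctx=4096,                         # context window to allocate
    n_gpu_layers=-1,                    # offload all layers to GPU if one is available
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in two sentences."}],
    max_tokens=200,
)
print(result["choices"][0]["message"]["content"])
```
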
**Gemma-2-9b-it-abliterated-GGUF** · bartowski · Large Language Model · English · 3,941 downloads · 37 likes
A quantized build of the abliterated Gemma-2-9B-it model, produced with llama.cpp and suitable for running in LM Studio.

**Phi-4-mini-instruct.gguf** · Mungert · MIT · Large Language Model · Other · 13.08k downloads · 25 likes
Phi-4-mini-instruct is a lightweight open model trained with a focus on high-quality, reasoning-rich data and supporting a 128K-token context length.

**3b-zh-ft-research_release-Q8_0-GGUF** · cludyw · Apache-2.0 · Large Language Model · Chinese · 20 downloads · 0 likes
A GGUF-format conversion of canopylabs/3b-zh-ft-research_release, suited to Chinese text generation tasks.

**google_gemma-3-1b-it-qat-GGUF** · bartowski · Large Language Model · 1,437 downloads · 2 likes
Multiple quantized builds derived from Google's Gemma 3 1B QAT (quantization-aware training) weights, intended for local inference.

**google_gemma-3-12b-it-qat-GGUF** · bartowski · Large Language Model · 10.78k downloads · 16 likes
Quantizations of the Gemma-3-12b model built from Google's QAT (quantization-aware training) weights, offered in multiple variants to match different hardware budgets.

**GLM-4-9B-0414** · THUDM · MIT · Large Language Model · Transformers · Multilingual · 6,856 downloads · 55 likes
GLM-4-9B-0414 is a lightweight 9B-parameter member of the GLM family with strong mathematical reasoning and general-task performance, ranking near the top among open models of similar scale.

**Orpheus-3b-0.1-ft-Q8_0-GGUF** · dodgeinmedia · Apache-2.0 · Large Language Model · English · 22 downloads · 0 likes
A GGUF-format conversion of canopylabs/orpheus-3b-0.1-ft, suited to text generation tasks.

**Orpheus-3b-0.1-ft-Q2_K.gguf** · athenasaurav · Apache-2.0 · Large Language Model · English · 25 downloads · 0 likes
A GGUF-format conversion of canopylabs/orpheus-3b-0.1-ft, suited to text generation tasks.

**Orpheus-3b-0.1-ft-Q4_K_M-GGUF** · athenasaurav · Apache-2.0 · Large Language Model · English · 162 downloads · 0 likes
A GGUF-format conversion of canopylabs/orpheus-3b-0.1-ft, suited to text generation tasks.

**Deepseek-V3-5layer** · chwan · Large Language Model · Transformers · 30.01k downloads · 1 like
A simplified 5-layer variant of DeepSeek-V3 for lightweight tasks and rapid experimentation.

**Arrowmint-Gemma3-4B-YUKI-v0.1** · DataPilot · Large Language Model · Multilingual · 73 downloads · 6 likes
A Japanese-language model tuned for AI VTuber (virtual YouTuber) conversation, built on Google's gemma-3-4b-it.

**Orpheus-3b-0.1-ft-Q4_K_M-GGUF** · freddyaboulton · Apache-2.0 · Large Language Model · English · 30 downloads · 1 like
A GGUF-quantized build of Orpheus-3B-0.1-FT for efficient inference.

**Gemma-3-4b-it-GGUF** · ysn-rfd · Large Language Model · 62 downloads · 1 like
Converted from google/gemma-3-4b-it to GGUF format with llama.cpp, for local deployment and inference; a conversion sketch follows this entry.

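As a point of reference for the conversion step mentioned above, llama.cpp ships a convert_hf_to_gguf.py script that turns a local Hugging Face checkpoint into a GGUF file. The sketch below is an assumption-based illustration only: the llama.cpp checkout location, the local model directory, and the output settings are placeholders, and the available options should be confirmed against the script's --help.

```python
# Hypothetical HF -> GGUF conversion using llama.cpp's convert_hf_to_gguf.py.
# All paths are placeholders; the positional argument must be a local directory
# holding a downloaded snapshot of google/gemma-3-4b-it.
import subprocess

subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "models/gemma-3-4b-it",                 # local snapshot of the original checkpoint
        "--outfile", "gemma-3-4b-it-f16.gguf",  # where to write the converted model
        "--outtype", "f16",                     # keep full precision; quantize in a later step
    ],
    check=True,
)
```
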
**bge-reranker-v2-m3-Q5_K_M-GGUF** · pyarn · Apache-2.0 · Text Embedding · Other · 31 downloads · 1 like
Converted from BAAI/bge-reranker-v2-m3 into GGUF format with llama.cpp through ggml.ai's GGUF-my-repo space; the base model is a reranker, used mainly for text-ranking (relevance scoring) tasks.

**Orpheus-3b-0.1-ft-Q2_K-GGUF** · Zetaphor · Apache-2.0 · Large Language Model · English · 67 downloads · 1 like
A GGUF-format model converted from canopylabs/orpheus-3b-0.1-ft, suited to text generation tasks.

**Phi-4-mini-instruct-abliterated** · lunahr · MIT · Large Language Model · Transformers · Multilingual · 250 downloads · 8 likes
Phi-4-mini-instruct is a lightweight open model built on synthetic data and curated public web content, with an emphasis on high-quality, reasoning-rich data. It supports a 128K-token context length and is refined with supervised fine-tuning and direct preference optimization for precise instruction following and safety.

**Phi-4-mini-instruct** · microsoft · MIT · Large Language Model · Transformers · Multilingual · 346.30k downloads · 455 likes
Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered public web data, focused on high-quality, reasoning-rich data. It supports a 128K-token context length and multilingual use.

**Mistral-Small-24B-Instruct-2501-GGUF** · MaziyarPanahi · Large Language Model · 474.73k downloads · 2 likes
A GGUF-quantized build of Mistral-Small-24B-Instruct-2501 for local deployment and text generation.

**rank_zephyr_7b_v1_full-GGUF** · tensorblock · MIT · Large Language Model · English · 66 downloads · 0 likes
A GGUF-quantized build of castorini/rank_zephyr_7b_v1_full, designed for text-ranking tasks.

**Llama-3.2-3B-Instruct-abliterated-GGUF** · ZeroWw · MIT · Large Language Model · English · 20 downloads · 2 likes
A quantization recipe in which the output and token-embedding tensors are kept in f16 while the remaining tensors use q5_k or q6_k, yielding a smaller file with quality close to a pure f16 model; a quantization sketch follows this entry.

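The mixed-precision recipe described above can be reproduced, roughly, with llama.cpp's llama-quantize tool, which accepts per-tensor overrides for the output and token-embedding tensors. The sketch below is a hedged illustration: the binary path and file names are placeholders, and the exact flag spellings should be verified against `llama-quantize --help` in your own build.

```python
# Assumed invocation of llama.cpp's llama-quantize: quantize most tensors to Q6_K
# while keeping the output and token-embedding tensors in f16, as the entry describes.
# File names are placeholders; check the flags against --help before relying on them.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--output-tensor-type", "f16",          # keep the output (lm_head) tensor at f16
        "--token-embedding-type", "f16",        # keep the token embeddings at f16
        "Llama-3.2-3B-Instruct-f16.gguf",       # placeholder: full-precision input GGUF
        "Llama-3.2-3B-Instruct-q6_k-mix.gguf",  # placeholder: output file
        "Q6_K",                                 # base quantization for the remaining tensors
    ],
    check=True,
)
```
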
**T5-Large-Q4_K_M-GGUF** · tianlp · Apache-2.0 · Large Language Model · Multilingual · 16 downloads · 0 likes
A GGUF conversion of google-t5/t5-large supporting tasks such as summarization and translation across several languages, including English, French, Romanian, and German, among others.

**3danimationdiffusion-v10-GGUF** · second-state · OpenRAIL · Image Generation · English · 182 downloads · 5 likes
A 3D-animation-style text-to-image model based on Stable Diffusion, supporting Disney- and anime-style 3D image generation.

**Phi-3.5-mini-instruct_Uncensored-GGUF** · bartowski · Apache-2.0 · Large Language Model · 1,953 downloads · 42 likes
Phi-3.5-mini-instruct_Uncensored is a quantized language model offered in variants suited to a range of hardware.

**Stable-Diffusion-V1-5-GGUF** · second-state · OpenRAIL · Image Generation · 12.24k downloads · 11 likes
Stable Diffusion v1.5 is a text-to-image model that generates high-quality images from textual descriptions.

**Phi-3-vision-128k-instruct** · microsoft · MIT · Image-to-Text · Transformers · Other · 25.19k downloads · 958 likes
Phi-3-Vision-128K-Instruct is a lightweight, state-of-the-art open multimodal model supporting a 128K-token context length, focused on high-quality reasoning over text and images.

**Phi-3-small-8k-instruct** · microsoft · MIT · Large Language Model · Transformers · Other · 22.92k downloads · 165 likes
Phi-3-Small-8K-Instruct is a 7B-parameter lightweight open model focused on high-quality reasoning, supporting an 8K context length and suited to commercial and research use in English settings.

**Phi-3-medium-4k-instruct** · microsoft · MIT · Large Language Model · Transformers · Other · 43.60k downloads · 219 likes
Phi-3-Medium-4K-Instruct is a 14B-parameter lightweight open model focused on high-quality reasoning, supporting a 4K context length and suited to commercial and research use in English settings.

**Vecteus-v1-gguf** · Local-Novel-LLM-project · Apache-2.0 · Large Language Model · Multilingual · 588 downloads · 8 likes
A GGUF-format build of Vecteus-v1 supporting English and Japanese text generation.

**phi-3-mini-4k-instruct-GGUF** · brittlewis12 · MIT · Large Language Model · 170 downloads · 1 like
Phi-3-Mini-4K-Instruct is a 3.8B-parameter lightweight, state-of-the-art open model trained on the Phi-3 datasets, which emphasize high-quality, reasoning-dense data.

**Phi-3-mini-4k-instruct-gguf** · microsoft · MIT · Large Language Model · Multilingual · 20.51k downloads · 488 likes
Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters, trained with a focus on high-quality, reasoning-dense data and suited to commercial and research use in English.

**Phi-3-mini-128k-instruct** · microsoft · MIT · Large Language Model · Transformers · Multilingual · 399.68k downloads · 1,638 likes
Phi-3-Mini-128K-Instruct is a 3.8B-parameter lightweight open model focused on reasoning, supporting a 128K context length.

**Phi-3-mini-4k-instruct** · microsoft · MIT · Large Language Model · Transformers · Multilingual · 685.17k downloads · 1,176 likes
Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters, trained with particular emphasis on high-quality, reasoning-dense data.

**SlimPLM-Query-Rewriting** · zstanjj · Large Language Model · Transformers · 53 downloads · 9 likes
A lightweight language model for query rewriting that parses user input into a structured form to improve retrieval quality.

**phixtral-2x2_8** · mlabonne · MIT · Large Language Model · Transformers · Multilingual · 178 downloads · 148 likes
phixtral-2x2_8 is the first mixture-of-experts (MoE) model built from two microsoft/phi-2 models, and it outperforms each individual expert.

**nekomata-14b-instruction-gguf** · rinna · Other · Large Language Model · Multilingual · 89 downloads · 11 likes
The GGUF version of rinna/nekomata-14b-instruction, compatible with llama.cpp for lightweight inference.
